Impetus Technologies | Data Engineer Interview Experience | 2+ YoE



Round 1: Technical - 1

Spark (PySpark)

🔹 Word Count Problem

1. Modify the word count code so that the output is sorted by word frequency in descending order.

2. Why is reduceByKey used instead of groupByKey?
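
For anyone preparing, here is a minimal sketch of how both parts might be answered with the RDD API. The function name `word_count_desc` and the input path are made up for illustration, and the tokenizing regex is just one reasonable choice, not what the interviewer asked for.

```python
import re


def word_count_desc(lines_rdd):
    """Return an RDD of (word, count) pairs, highest count first.

    `lines_rdd` is assumed to be an RDD of text lines,
    e.g. the result of sc.textFile(...).
    """
    return (
        lines_rdd
        # Split each line into lowercase words.
        .flatMap(lambda line: re.findall(r"\w+", line.lower()))
        .map(lambda word: (word, 1))
        # reduceByKey merges counts inside each partition before the shuffle,
        # so less data moves across the network than with groupByKey,
        # which ships every (word, 1) pair and only then aggregates.
        .reduceByKey(lambda a, b: a + b)
        # Sort by count descending; ties are ordered alphabetically.
        .sortBy(lambda pair: (-pair[1], pair[0]))
    )


# Usage, assuming an existing SparkContext `sc` (hypothetical path):
# counts = word_count_desc(sc.textFile("input.txt"))
# print(counts.take(10))
```

The same result could also be produced with `sortBy(lambda pair: pair[1], ascending=False)` if tie order does not matter.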

🔹 What is lineage in Spark?

🔹 Difference between cache and persist in Spark.

🔹 Is fault tolerance handled the same way in Spark and Hadoop?

SQL

🔹 Explain query execution order.

🔹 What are the different types of joins in SQL?

🔹 Explain the difference between DENSE_RANK and RANK.
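
A small example makes the difference easy to see. The sketch below uses Python's built-in sqlite3 module with a made-up `scores` table, and assumes the bundled SQLite is version 3.25 or newer (the first release with window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("A", 90), ("B", 90), ("C", 80), ("D", 70)],
)

rows = conn.execute(
    """
    SELECT name, score,
           RANK()       OVER (ORDER BY score DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
    FROM scores
    ORDER BY score DESC, name
    """
).fetchall()

for row in rows:
    print(row)
# ('A', 90, 1, 1)
# ('B', 90, 1, 1)
# ('C', 80, 3, 2)   RANK skips 2 after the tie, DENSE_RANK does not
# ('D', 70, 4, 3)
```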

🔹 What is a cursor in SQL?

🔹 What is a stored procedure in SQL?

Python

🔹 What is a docstring in Python?
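
A quick illustration, using a hypothetical `area` function: the docstring is the string literal placed as the first statement of a function, class, or module, and it is available at runtime through `__doc__` and `help()`.

```python
def area(width, height):
    """Return the area of a rectangle with the given width and height."""
    return width * height


# The docstring is stored on the function object itself.
print(area.__doc__)
```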

🔹 What is pass in Python? When is it used?
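
As a rough sketch of typical uses (the names `NotReadyError` and `load_config` are invented for the example): `pass` is a no-op statement that fills a block where the syntax requires a body but there is nothing to execute yet.

```python
class NotReadyError(Exception):
    # A custom exception that needs no extra behavior.
    pass


def load_config(path):
    # Placeholder body so the module still imports while the
    # real implementation is pending. Returns None for now.
    pass
```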

🔹 Which data structure occupies more memory: list or tuple? Why?
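
One way to check this yourself is `sys.getsizeof`. The sketch below assumes CPython; exact byte counts vary by version and platform, so none are shown. The list comes out larger because it is mutable: it keeps a pointer to a separately allocated, resizable array plus bookkeeping for spare capacity, while a tuple stores its item references inline at a fixed size.

```python
import sys

items_list = [1, 2, 3, 4, 5]
items_tuple = (1, 2, 3, 4, 5)

# getsizeof measures the container only, not the integers it refers to.
list_size = sys.getsizeof(items_list)
tuple_size = sys.getsizeof(items_tuple)

print(f"list:  {list_size} bytes")
print(f"tuple: {tuple_size} bytes")
```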

🔹 Python code to count the frequency of characters in a given text file.
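
A minimal sketch of one possible answer, using `collections.Counter`. The function name `char_frequency` is my own, and this version counts every character including spaces and newlines; filter them out if the interviewer wants letters only.

```python
from collections import Counter


def char_frequency(path, encoding="utf-8"):
    """Count how often each character appears in the file at `path`."""
    counts = Counter()
    with open(path, encoding=encoding) as f:
        # Read line by line so large files are not loaded into memory at once.
        for line in f:
            counts.update(line)
    return counts


# Usage (hypothetical file name):
# for char, count in char_frequency("sample.txt").most_common():
#     print(repr(char), count)
```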

🔹 Python code to create a palindrome from a given number of letters.

Example: For n=3 (letters: a, b, c) → Palindrome: abcba.
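
Going by the example above, one possible reading is "take the first n lowercase letters, then mirror them without repeating the middle one". A short sketch under that assumption (the function name and the 1 to 26 limit are my own choices):

```python
import string


def build_palindrome(n):
    """Build a palindrome from the first n lowercase letters, e.g. 3 -> 'abcba'."""
    if not 1 <= n <= 26:
        raise ValueError("n must be between 1 and 26")
    letters = string.ascii_lowercase[:n]
    # Append the reverse of everything except the last letter,
    # so the last letter sits in the middle exactly once.
    return letters + letters[:-1][::-1]


print(build_palindrome(3))  # abcba
```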

Round 2: Technical - 2

AWS

🔹 What is the Data Catalog in AWS Glue?

🔹 Difference between Athena and Aurora.

🔹 What is versioning in S3?

🔹 What are the different data distribution styles in Redshift?

Projects

🔹 Explain the problem statement of your projects and walk through the implementation details.

Round 3: Managerial

🔹 Describe your past experiences.

🔹 Answer scenario-based questions related to your projects or work environment.

Round 4: HR

🔹 Why are you looking for a change?

🔹 Salary negotiation.

🔹 Overview of the company's operations and the types of projects it undertakes.